[WIP] Support Prism by kzkn · Pull Request #347 · ruby-formatter/rufo

kzkn · 2025-03-01T03:03:22Z

Strategy

Swap the parsing backend from Ripper to Prism. The chosen approach is a full rewrite of the formatter (Rufo::PrismFormatter in lib/rufo/prism_formatter.rb), not a parser-only swap.

Rejected alternative: keep lib/rufo/formatter.rb (4221 lines) and adapt it to consume Prism AST nodes in place of Ripper sexp.

Reasons:

Prism AST shape differs significantly from Ripper sexp: named fields vs. positional arrays, different node decomposition for heredocs, comments, and operator calls.
The existing formatter is intertwined with Ripper's token stream and recovery quirks (lex/sexp dual access, sexp_unparsable_code fallback, token-position-driven comment placement).
A rewrite forces a clean re-evaluation of each formatting decision (alignment options, indent semantics, comment interleaving) against the new AST instead of patching old assumptions.
Cost is paid once; the result is a smaller engine whose state can be reasoned about node by node.

Both engines coexist via engine: :prism on Rufo.format. The default remains Ripper until PrismFormatter reaches feature parity. Default-flip criteria (to be confirmed): all checklist files migrated, no behavior regression on a sample corpus, no significant performance regression.

Architecture

lib/rufo/prism_formatter.rb — visitor-based formatter over Prism AST.
Source-offset cursor (@source_offset) drives comment interleaving: comments live in parse_result.comments (outside the AST), so the visitor drains them before each node enters output, classifying standalone vs trailing by inspecting the source line.
Layout primitives: write, write_newline, indent_by, with a pending-indent flag so nodes do not need to know their own indent.
Heredoc queue: <<EOS opening is written in place; body and closing are appended on the next write_newline (their source locations live separately from the opening's).
Non-fatal Prism errors: a small whitelist (:invalid_block_exit, :invalid_retry_without_rescue) is treated as semantic-validity issues rather than parse failures, matching the legacy formatter's behavior on syntactically-valid-but-semantically-invalid code.

Spec sync policy

spec/lib/rufo/prism_formatter_source_specs/ mirrors the layout of spec/lib/rufo/formatter_source_specs/. Verbatim copy when every case passes; hand-curated subset when only some pass; no file when the topic is not yet supported. The header comment in spec/lib/rufo/prism_formatter_spec.rb documents the sync workflow.

Progress

The previous implementation wrote the message before the receiver, which produced "fooa" for "a.foo". Branch on call_operator_loc to distinguish method calls (receiver, op, message) from unary prefix operators (message, receiver). Add a regression spec covering the simple dot, chained dots, and safe navigation cases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

DEBUG was hard-coded to true and printed `p [:debug, ...]` on every parse, polluting test output. Read RUFO_PRISM_DEBUG from the env so local debugging stays opt-in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Prism exposes comments outside the AST (parse_result.comments), so the visitor needs to drain them at the right points to preserve order against AST nodes. Track @source_offset (position past the last source bytes already accounted for). write_code_at now drains comments before the location, writes the source slice, and advances the cursor. visit_statements_node drains before each child as well, so standalone comments above a statement land on their own line. PrismFormatter#format calls visitor.finish at the end to drain remaining comments past the last statement. emit_comment classifies each comment by what precedes it on its source line: whitespace-only -> standalone (own line at current position), otherwise trailing (preserve the gap from the previous emitted source position). visit_nil/true/false/instance_variable_read switch to write_code_at(node.location) for consistent cursor tracking; visit_call_node now uses write_code_at(message_loc) for the same reason. visit_local_variable_write_node and visit_undef_node manually consume up to their start so leading standalone comments are not skipped. Adds a comments.rb.spec covering the cases that do not require blank-line preservation: single standalone, two consecutive standalones, trailing-after-code, standalone-before-code, and trailing-then-standalone. Blank-line preservation between comments needs layout state (next step) and is left for then. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@output

Replace bare @output << x with a small write API: - write: appends content and emits the pending indent if we are at the start of a line. - write_newline / write_newline_unless_pending: ends a line, marks indent as pending for the next write. - indent_by(amount) { ... }: scoped indent push. write_code_at flows through write, so cursor tracking and indent emission stay consistent for the existing literal visitors. emit_comment uses write/write_newline too, so standalone comments land at the current indent. Add visit_if_node as the first non-literal node to validate the state machine: predicate -> newline -> indent_by(INDENT_SIZE) -> body -> newline -> end. Spec covers indent introduction on a flat body, preservation of an already-indented body, double indent for nested if, and an empty body. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Prism gives heredocs three locations: opening (<<EOS), content (body), and closing (the EOS terminator line). The node's location only spans the opening — the body and closing live later in source on their own lines. Detect heredocs in visit_string_node by the opening_loc.slice prefix (<<). Write the opening verbatim, push the node onto a @pending_heredocs queue, and advance the source cursor past closing_loc.end_offset so the cursor stays consistent with emit position. write_newline flushes the queue right after emitting "\n", which is the correct insertion point: the body must follow the line that opened the heredoc. finish() also flushes so a trailing heredoc at EOF is not lost. The body/closing are copied verbatim from source, which is the correct behavior for all three forms (<<, <<-, <<~) because Prism's location ranges already encode the desired indentation. Spec covers bare, assigned, dash, and squiggly forms, plus a heredoc followed by a statement and one followed by a standalone comment then a statement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

DEBUG and both debug_log methods were development-only print helpers. The env-gated form added in 6a was still dead weight in the shipped formatter; drop the constant, the two method definitions, and the lone call site. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

RedoNode / RetryNode are bare keywords — write_code_at(node.location). AliasMethodNode and AliasGlobalVariableNode share the same shape (keyword, new_name, old_name with single spaces); share a private visit_alias helper. Prism flags top-level redo/retry as :syntax-level errors even though it still builds a complete AST (semantic validity, not parse failure). The existing Ripper-based formatter formats these inputs fine. Add a NON_FATAL_ERROR_TYPES list so PrismFormatter skips :invalid_block_exit and :invalid_retry_without_rescue when deciding whether to raise. Other :syntax errors (e.g. dynamic constant assignment in def) still raise, preserving the existing behavior. Specs are copied verbatim from formatter_source_specs/. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

visit_unless_node mirrors visit_if_node but also dispatches the optional else_clause (an ElseNode) before writing end. Add visit_else_node as a shared helper that emits "else", a newline, the indented body, and a trailing newline; visit_if_node will reuse it once elsif chains are designed. Both unless cases from formatter_source_specs (with and without an empty else) format and round-trip correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Safe-navigation cases (foo &. bar / foo&. bar) round-trip through the existing visit_call_node with no new code. XStringNode, RegularExpressionNode, and InterpolatedRegularExpressionNode are all "copy node.location verbatim" cases for the topics covered in the specs; add the three visitor methods and copy the corresponding spec files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add a header comment explaining the relationship between prism_formatter_source_specs/ and formatter_source_specs/: verbatim copies when fully supported, hand-curated subsets when partial, no file when unsupported. Syncing is manual until PrismFormatter approaches parity and a per-engine PENDING marker can replace the dual-directory arrangement. This is the lower-impact half of the original 6b plan. A true consolidation (single spec dir + per-engine pending markers) would require extending the spec parser and is deferred until the divergence between engines shrinks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Remove unused code_at — every caller already uses write_code_at. - Drop the amount parameter on indent_by and rename to indent. The only argument ever passed was the INDENT_SIZE constant; pull the constant inside the helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

kzkn and others added 30 commits March 1, 2025 11:59

add PrismFormatter

a60da63

add prism as a development dependency

dd63643

add nil.rb.spec

3ab9620

add chars.rb.spec

c29bfc8

add integers.rb.spec

619d92b

add class_variables.rb.spec

34cc82d

add leading_newlines.rb.spec

3869d51

add rationals.rb.spec

e65f2a9

add imaginaries.rb.spec

8e0a37b

add spaces_inside_hash_brace.rb.spec

990722e

add floats.rb.spec

367288c

add booleans.rb.spec

264c0fa

add special_global_variables.rb.spec

0627c5f

add symbol_literals.rb.spec

2273743

format

7d4839e

traverse ast output by prism

fc62638

variables

6f2a1a8

undef

1c65793

unary operator

1b2c006

Merge remote-tracking branch 'origin/master' into prism

bd11cdc

gate prism formatter DEBUG output behind env var

8478da4

DEBUG was hard-coded to true and printed `p [:debug, ...]` on every parse, polluting test output. Read RUFO_PRISM_DEBUG from the env so local debugging stays opt-in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Support Prism#347

[WIP] Support Prism#347
kzkn wants to merge 31 commits into
masterfrom
prism

kzkn commented Mar 1, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kzkn commented Mar 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Strategy

Architecture

Spec sync policy

Progress

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

kzkn commented Mar 1, 2025 •

edited

Loading